
    A Modeling Approach based on UML/MARTE for GPU Architecture

    Nowadays, high-performance computing is part of the embedded-systems context. Graphics Processing Units (GPUs) are increasingly used to accelerate a wide range of algorithms and applications. Over the past years, however, little effort has been devoted to describing abstractions of applications in relation to their target architectures. Thus, when developers need to map applications onto GPUs, for example, they find it difficult and fall back on the APIs of these architectures. This paper presents a metamodel extension for the MARTE profile and a model for GPU architectures. The main goal is to specify task and data allocation in the memory hierarchy of these architectures. The results show that this approach helps generate code for GPUs based on model transformations using Model Driven Engineering (MDE). Comment: Symposium en Architectures nouvelles de machines (SympA'14) (2011)

    An autoadaptative limited memory Broyden's method to solve systems of nonlinear equations

    We propose a new Broyden-like method that we call the autoadaptative limited memory method. Unlike classical limited memory methods, we do not need to set parameters such as the maximal subspace size the solver can use. Instead, the autoadaptative algorithm automatically enlarges the approximation subspace when the convergence rate decreases. The convergence of this algorithm is superlinear under classical hypotheses. A few numerical results on well-known benchmark functions are also provided and show the efficiency of the method
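
The autoadaptative variant manages the subspace size automatically; as background, the classical "good" Broyden update it builds on can be sketched as follows. This is a minimal illustration with a made-up 2x2 test system, not the authors' algorithm:

```python
import numpy as np

def broyden(F, x0, tol=1e-10, max_iter=100):
    """Solve F(x) = 0 with Broyden's 'good' method: the Jacobian
    approximation B is updated from secant pairs instead of being
    recomputed at every step."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                 # initial Jacobian approximation
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = np.linalg.solve(B, -Fx)    # quasi-Newton step
        x = x + s
        Fx_new = F(x)
        y = Fx_new - Fx
        # rank-one secant update, enforcing B_new @ s = y
        B += np.outer(y - B @ s, s) / (s @ s)
        Fx = Fx_new
    return x

# example system: intersect a circle with the line x = y
def F(v):
    x, y = v
    return np.array([x**2 + y**2 - 2.0, x - y])

root = broyden(F, [2.0, 0.5])
```

A limited-memory variant would store the recent update pairs (s, y) instead of the dense matrix B; the autoadaptative method additionally decides how many pairs to keep based on the observed convergence rate.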

    Programming Massively Parallel Architectures using MARTE: a Case Study

    Nowadays, several industrial applications are being ported to parallel architectures, taking advantage of the potential parallelism provided by multi-core processors. Many-core processors, especially GPUs (Graphics Processing Units), have led the race in floating-point performance since 2003. While the performance improvement of general-purpose microprocessors has slowed significantly, GPUs have continued to improve relentlessly. As of 2009, the ratio of peak floating-point throughput between many-core GPUs and multi-core CPUs is about 10 to 1. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Aiming to improve the use of many-core processors, this work presents a case study using UML and the MARTE profile to specify and generate OpenCL code for intensive signal processing applications. Benchmark results show the viability of using MDE approaches to generate GPU applications

    Component-based Models Going Generic: the MARTE Case-Study

    One of the reasons for using component-based modeling is to improve reusability. However, there are cases where a whole component cannot be reused just because one element of its internal structure does not present the required features (e.g., type, multiplicity, etc.). In this paper, we propose the use of parameterized components as a way to address this problem, and thus get a further boost in reusability. The UML specification supports parameterization via templates. However, when it comes to component-based modeling, UML is but the first metamodel in sometimes long chains of transformations comprising other domain metamodels. So, in order to keep parameters deeper down the transformation chains, we introduce generic components in those metamodels. Instead of changing the target metamodel, however, we decided to create an independent metamodel with the additional concepts required by parameterization, so that it can be attached to any target metamodel. The most obvious advantage of this approach is that we do not have to touch the target metamodel. We also demonstrate how existing transformations can be easily adapted to accept the parameter-related concepts. To illustrate our ideas, we used OMG's MARTE metamodel for real-time and embedded systems. The approach has been validated through transformations written in QVT

    Automatic Multi-GPU Code Generation applied to Simulation of Electrical Machines

    Electrical and electronic engineering has long used parallel programming to solve its large-scale, complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Thus, in order to reduce design complexity, we propose an approach to generate code for hybrid architectures (e.g. CPU + GPU) using OpenCL, an open standard for parallel programming of heterogeneous systems. This approach is based on Model Driven Engineering (MDE) and the MARTE profile, a standard proposed by the Object Management Group (OMG). The aim is to provide resources for non-specialists in parallel programming to implement their applications. Moreover, thanks to the reuse capacity of models, we can add or change functionalities or the target architecture. Consequently, this approach helps industries meet their time-to-market constraints, and experimental tests confirm performance improvements in multi-GPU environments. Comment: Compumag 201

    Enabling Traceability in an MDE Approach to Improve Performance of GPU Applications

    Graphics Processor Units (GPUs) are known for offering high performance and power efficiency for processing algorithms that suit their massively parallel architecture. Unfortunately, as parallel programming for this kind of architecture requires a complex distribution of tasks and data, developers find it difficult to implement their applications effectively. Although approaches based on source-to-source and model-to-source transformations have aimed to provide a low learning curve for parallel programming and to take advantage of architecture features to create optimized applications, programming remains difficult for neophytes. A Model Driven Engineering (MDE) approach for GPUs intends to hide the low-level details of GPU programming by automatically generating the application from high-level specifications. However, the application designer should take into account some adjustments to the source code to achieve better performance at runtime. Directly modifying the generated source code goes against the MDE philosophy. Moreover, the designer does not necessarily have the required knowledge to effectively modify the generated GPU code. This work aims at improving performance by feeding back into the high-level models specific execution data from a profiling tool, enhanced by smart advice from an analysis engine. In order to keep the link between execution and model, the process is based on a traceability mechanism. Once the model is automatically annotated, it can be refactored with the aim of improving performance in the re-generated code. Hence, this work allows us to keep coherence between model and code without forgetting to harness the power of GPUs.
To illustrate and clarify key points of this approach, an experimental example set in a transformation chain from UML-MARTE models to OpenCL code is provided

    A Deflated Version of the Conjugate Gradient Algorithm

    We present a deflated version of the conjugate gradient algorithm for solving linear systems. The new algorithm can be useful when a small number of eigenvalues of the iteration matrix are very close to the origin. It can also be useful when solving linear systems with multiple right-hand sides, since the eigenvalue information gathered while solving one linear system can be recycled for solving the next systems and then updated
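
The deflation idea can be sketched in a few lines. This follows the common Galerkin-projection formulation of deflated CG, not necessarily the paper's exact variant; the function and the example matrix below are illustrative:

```python
import numpy as np

def deflated_cg(A, b, W, tol=1e-10, max_iter=200):
    """CG for SPD A with deflation of span(W), where the columns of W
    are (approximate) eigenvectors for eigenvalues near the origin."""
    AW = A @ W
    mu = W.T @ AW                        # small k x k Galerkin matrix
    solve_mu = lambda v: np.linalg.solve(mu, v)
    x = W @ solve_mu(W.T @ b)            # exact solve on span(W)
    r = b - A @ x                        # r is orthogonal to W
    # keep search directions A-orthogonal to span(W)
    project = lambda v: v - W @ solve_mu(AW.T @ v)
    p = project(r)
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = project(r) + (rs_new / rs) * p
        rs = rs_new
    return x

# illustrative SPD system with two eigenvalues very close to the origin
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
A = Q @ np.diag(np.r_[[1e-4, 1e-3], np.linspace(1, 10, 48)]) @ Q.T
b = rng.standard_normal(50)
W = Q[:, :2]                 # eigenvectors of the two small eigenvalues
x = deflated_cg(A, b, W)
```

Deflating the two near-zero eigenvalues leaves an effective condition number of about 10, so CG converges quickly instead of stalling; for multiple right-hand sides, W can be reused and refined from one solve to the next.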

    An MDE Approach for Automatic Code Generation from MARTE to OpenCL

    Advanced engineering and scientific communities have used parallel programming to solve their large-scale complex problems; achieving high performance is the main advantage of this choice. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Thus, in order to reduce design complexity, we propose an approach to generate code for the OpenCL API, an open standard for parallel programming of heterogeneous systems. This approach is based on Model Driven Engineering (MDE) and on Modeling and Analysis of Real-Time and Embedded Systems (MARTE), a standard proposed by the Object Management Group (OMG). The aim is to provide resources for non-specialists in parallel programming to implement their applications. Moreover, concepts like reuse and platform independence are present: once we have designed an application and an execution-platform architecture, we can reuse the same project to add more functionalities and/or change the target architecture. Consequently, this approach helps industries meet their time-to-market constraints. The resulting code, for both the host and the compute devices, consists of compilable source files that satisfy the specifications defined at design time
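
The chain ends in model-to-text transformation: platform-independent model elements are rendered into OpenCL source. As a loose illustration of that last step (a toy template-based generator, not the authors' tooling; the model dictionary and kernel are hypothetical):

```python
from string import Template

# A toy "model": the kind of information a UML/MARTE design would
# carry once the transformation chain has flattened it.
model = {
    "kernel": "vector_add",
    "dtype": "float",
    "size": 1024,
}

# Model-to-text template producing an OpenCL C kernel.
KERNEL_TEMPLATE = Template("""\
__kernel void $kernel(__global const $dtype* a,
                      __global const $dtype* b,
                      __global $dtype* out) {
    int gid = get_global_id(0);
    if (gid < $size) {
        out[gid] = a[gid] + b[gid];
    }
}
""")

source = KERNEL_TEMPLATE.substitute(model)
```

Changing the target architecture then amounts to swapping the template set while reusing the same model, which is the reuse/platform-independence argument made above.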

    Using ArrayOL to Identify Potentially Shareable Data in Thread Work-Groups of GPUs

    Over recent years, using Graphics Processing Units (GPUs) has become an effective method for increasing the performance of many applications. However, these performance benefits from GPUs come at a price. First, extensive programming expertise and intimate knowledge of the underlying hardware are essential for gaining good speedups. Second, the expressibility of GPU-based programs is not powerful enough to retain the high-level abstractions of the solutions. Although the programming experience has been significantly improved by existing frameworks like CUDA and OpenCL, it is still a challenge to effectively utilise these devices while retaining the programming abstractions. To this end, performing a model-to-source transformation, whereby a high-level language is mapped to CUDA or OpenCL, is an attractive option. In particular, it enables developers to harness the power of GPUs without any expertise in GPGPU programming. In this work, we propose an approach based on MDE and ArrayOL to detect shareable data zones. The tilers from ArrayOL, which express the data parallelism of repetitive tasks, are analyzed at compile time to create areas of shared data. Identifying these areas is crucial, as it allows us to load data into shared memory areas, which have high throughput. Consequently, automatically generated programs should achieve performance comparable to manually well-written programs
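
In ArrayOL, a tiler is given by an origin vector and paving and fitting matrices: repetition r reads the elements origin + P·r + F·i (modulo the array shape) over all pattern indices i. A minimal sketch of the underlying analysis, computing footprints and the elements touched by several repetitions (candidates for shared memory); the helper name and the 1-D stencil example are illustrative, not the paper's implementation:

```python
import numpy as np
from itertools import product
from collections import Counter

def tile_footprint(origin, fitting, pattern_shape, rep_index, paving, array_shape):
    """Array elements read by one repetition of an ArrayOL tiler:
    origin + paving @ rep_index + fitting @ i, modulo the array
    shape, for every pattern index i."""
    base = np.asarray(origin) + np.asarray(paving) @ np.asarray(rep_index)
    pts = set()
    for i in product(*(range(s) for s in pattern_shape)):
        p = (base + np.asarray(fitting) @ np.asarray(i)) % np.asarray(array_shape)
        pts.add(tuple(int(v) for v in p))
    return pts

# 1-D example: a 3-point stencil sliding with stride 1 over 8 elements
footprints = [
    tile_footprint(origin=[0], fitting=[[1]], pattern_shape=(3,),
                   rep_index=[r], paving=[[1]], array_shape=[8])
    for r in range(4)
]

# elements touched by more than one repetition: worth staging in
# the work-group's shared (local) memory
counts = Counter(p for fp in footprints for p in fp)
shareable = {p for p, c in counts.items() if c > 1}
```

Because the tiler matrices are known statically, this overlap analysis can run entirely at compile time, which is what lets the generator emit the local-memory staging code automatically.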